Design of FastQuery: How to Generalize Indexing and Querying System for Scientific Data

Authors

  • Jerry Chou
  • Kesheng Wu
Abstract

Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies such as FastBit are critical for facilitating interactive exploration of large datasets. These technologies rely on adding auxiliary information to existing datasets to accelerate query processing. To use these indices, we need to match the relational data model used by the indexing systems with the array data model used by most scientific data, and to provide an efficient input and output layer for reading and writing the indices. In this work, we present a flexible design that can be easily applied to most scientific data formats. We demonstrate this flexibility by applying it to two of the most commonly used scientific data formats, HDF5 and NetCDF. We present two case studies using simulation data from the particle accelerator and climate simulation communities. To demonstrate the effectiveness of the new design, we also present a detailed performance study using both synthetic and real scientific workloads.
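The model-matching step described in the abstract can be illustrated with a minimal sketch. It is not FastQuery or FastBit code: the function names, the binning scheme, and the use of Python integers as bitmaps are all assumptions made for illustration. A 2-D array is viewed as a single relational column by assigning each element a row id in row-major order, a binned bitmap index is built as auxiliary information, and a range condition is answered from the index alone.

```python
from itertools import product

def flatten_to_column(array_2d):
    """View a 2-D array as one relational column (hypothetical helper).

    Row id r corresponds to coords[r]; values[r] is the element stored there.
    """
    n_rows, n_cols = len(array_2d), len(array_2d[0])
    coords = list(product(range(n_rows), range(n_cols)))  # row-major order
    values = [array_2d[i][j] for i, j in coords]
    return coords, values

def build_bitmap_index(values, bin_edges):
    """Build one bitmap per bin [bin_edges[b], bin_edges[b + 1]).

    Each bitmap is a Python int; bit r is set when values[r] falls in the bin.
    Values outside every bin are simply not indexed in this sketch.
    """
    bitmaps = [0] * (len(bin_edges) - 1)
    for row_id, v in enumerate(values):
        for b in range(len(bitmaps)):
            if bin_edges[b] <= v < bin_edges[b + 1]:
                bitmaps[b] |= 1 << row_id
                break
    return bitmaps

def query_greater_equal(bitmaps, bin_edges, threshold):
    """Return row ids with value >= threshold using only the bitmaps.

    Assumes threshold is exactly one of the bin edges, so no raw data
    needs to be re-read to resolve a partially covered bin.
    """
    start = bin_edges.index(threshold)
    hits = 0
    for bm in bitmaps[start:]:
        hits |= bm  # OR together every bin at or above the threshold
    return [r for r in range(hits.bit_length()) if (hits >> r) & 1]

# Made-up sample data standing in for one variable of a scientific dataset.
temperature = [[-1.5, 0.5, 2.0],
               [3.5, -0.2, 1.0]]
bin_edges = [-2.0, 0.0, 2.0, 4.0]

coords, values = flatten_to_column(temperature)
bitmaps = build_bitmap_index(values, bin_edges)
rows = query_greater_equal(bitmaps, bin_edges, 0.0)
print([coords[r] for r in rows])  # array coordinates of the matching elements
```

Keeping the coordinate list next to the flattened values is what lets a relational-style result (row ids) be translated back into array positions; a real system would compute that mapping from the array shape rather than store it.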

Similar resources

FastQuery: A General Indexing and Querying System for Scientific Data

Approaches:
  • Develop FastQuery [1], a generalized parallel index and query system for scientific data.
  • Use FastBit [6,7], a state-of-the-art index and query technology.
  • Allow array specification for querying and indexing data, such as buildIndex("data[0,:]"), getNumHits("data[0,:]>0").
  • Deploy a flexible yet simple variable naming scheme based on regular expression.
  • Define a unified array...

Developing a BIM-based Spatial Ontology for Semantic Querying of 3D Property Information

With the growing dominance of complex and multi-level urban structures, current cadastral systems, which are often developed based on 2D representations, are not capable of providing unambiguous spatial information about urban properties. Therefore, the concept of 3D cadastre is proposed to support 3D digital representation of land and properties and facilitate the communication of legal owners...

Lightweight Indexing of Observational Data in Log-Structured Storage

Huge amounts of data are being generated by sensing devices every day, recording the status of objects and the environment. Such observational data is widely used in scientific research. As the capabilities of sensors keep improving, the data produced are drastically expanding in precision and quantity, making it a write-intensive domain. Log-structured storage is capable of providing high writ...

Scalable Run-time Data Indexing and Querying for Scientific Simulations

Scientific simulations running at scale on high-end computing systems are generating tremendous amounts of raw data, which has to be carefully analyzed before scientists can derive insights from simulations and better understand the phenomena being modeled. Query-driven data analysis is an important technique used by scientists to gain insights from data, especially to capture intermittent trans...

Towards a Microblogs Data Management System (Invited Industrial Paper)

This paper advocates for the need to build a Microblogs Data Management System (MDMS) as an end-to-end data management system to support indexing, querying, and analyzing microblogs, e.g., tweets, comments, or check-ins. We identify a set of characteristics for microblogging environments that distinguish them from any other data management environment. Then, we propose a system architecture f...

Publication date: 2011